
Coresets for Scalable Bayesian Logistic Regression

Neural Information Processing Systems

The use of Bayesian methods in large-scale data settings is attractive because of the rich hierarchical models, uncertainty quantification, and prior specification they provide. Standard Bayesian inference algorithms are computationally expensive, however, making their direct application to large datasets difficult or infeasible. Recent work on scaling Bayesian inference has focused on modifying the underlying algorithms to, for example, use only a random data subsample at each iteration. We leverage the insight that data is often redundant to instead obtain a weighted subset of the data (called a coreset) that is much smaller than the original dataset. We can then use this small coreset in any number of existing posterior inference algorithms without modification.
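The construction the abstract describes — sampling a small weighted subset so that its weighted likelihood stands in for the full-data likelihood — can be sketched as follows. This is a minimal, illustrative sketch, not the paper's algorithm: the norm-based sensitivity proxy and the function name `build_coreset` are assumptions (the paper derives tighter sensitivity bounds, e.g. via a clustering of the data).

```python
import numpy as np

def build_coreset(X, y, m, rng=None):
    """Illustrative sensitivity-based coreset construction for logistic
    regression (a simplified stand-in for the paper's algorithm).

    Each point i gets a crude 'sensitivity' score s_i; points are sampled
    with probability p_i proportional to s_i and reweighted by 1/(m * p_i),
    so the weighted log-likelihood is an unbiased estimate of the
    full-data log-likelihood.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n = X.shape[0]
    # Assumed sensitivity proxy: the data-point norm. The paper's bounds
    # are sharper; this only illustrates the sample-and-reweight pattern.
    s = np.linalg.norm(X, axis=1) + 1e-12
    p = s / s.sum()
    idx = rng.choice(n, size=m, replace=True, p=p)
    w = 1.0 / (m * p[idx])  # importance weights
    return X[idx], y[idx], w
```

The returned `(X[idx], y[idx], w)` triple can then be handed to any inference algorithm that accepts per-datum weights, replacing the full-data log-likelihood with the weighted sum over the coreset.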


Reviews: Coresets for Scalable Bayesian Logistic Regression

Neural Information Processing Systems

Note that in minibatch inference methods, at each iteration a small subset of the data is sampled from the full dataset and used to make an update; these methods exploit the redundancy in data to perform inexpensive updates. In this paper, coresets reduce the total dataset size by, in some sense, approximating the dataset with a smaller group of (weighted) examples. However, when coresets are used in existing inference algorithms (such as these minibatch algorithms), it seems to me that a very similar procedure will occur: a small subset of this approximate, weighted dataset will be drawn and used to make an update. I am therefore not convinced this would actually speed up inference. In a sense, I feel that the main thing happening here is that the data is approximated in a smaller, compressed form; I can see how this might help with data storage concerns, but I don't see a strong justification for why it would appreciably speed inference over existing minibatch methods (especially considering that a coreset must be constructed before inference can proceed, which adds additional time to this method). One way to demonstrate a speedup would be timing comparison plots that explicitly show coresets yielding faster inference on large datasets than minibatch methods; however, no direct experiments of this sort are given.
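The reviewer's structural point — that inference on a weighted coreset still proceeds by subsampled updates, just as on the full dataset — can be made concrete with a sketch of a weighted stochastic-gradient step for logistic regression. All names here (`logistic_grad`, `sgd_step`) are illustrative, not from the paper:

```python
import numpy as np

def logistic_grad(theta, X, y, w):
    """Gradient of the weighted log-likelihood sum_i w_i * log p(y_i | x_i, theta)
    for logistic regression with labels y in {-1, +1}."""
    z = y * (X @ theta)
    # d/dz log sigmoid(z) = sigmoid(-z), and dz/dtheta = y * x
    coef = w * y * (1.0 / (1.0 + np.exp(z)))
    return X.T @ coef

def sgd_step(theta, X, y, w, batch, lr=0.1):
    """One update; structurally identical whether (X, y, w) is the full
    dataset with unit weights or a precomputed weighted coreset."""
    n, b = len(w), len(batch)
    # rescale so the minibatch gradient is unbiased for the full weighted sum
    g = (n / b) * logistic_grad(theta, X[batch], y[batch], w[batch])
    return theta + lr * g
```

The update rule is the same in both cases; the coreset's contribution is only that `n` is much smaller, which is precisely why the reviewer asks for timing comparisons against running the same subsampled updates on the original dataset.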


Coresets for Scalable Bayesian Logistic Regression

Huggins, Jonathan, Campbell, Trevor, Broderick, Tamara

Neural Information Processing Systems
